Search for: All records

Editors contains: "Guruswami, Venkatesan"

« Prev Next »

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Training Multi-Layer Over-Parametrized Neural Network in Subquadratic Time

https://doi.org/10.4230/LIPIcs.ITCS.2024.93

Song, Zhao; Zhang, Lichen; Zhang, Ruizhe (January 2024, Schloss Dagstuhl – Leibniz-Zentrum für Informatik)
Guruswami, Venkatesan (Ed.)
We consider the problem of training a multi-layer over-parametrized neural network to minimize the empirical risk induced by a loss function. In the typical setting of over-parametrization, the network width m is much larger than the data dimension d and the number of training samples n (m = poly(n,d)), which induces a prohibitive large weight matrix W ∈ ℝ^{m× m} per layer. Naively, one has to pay O(m²) time to read the weight matrix and evaluate the neural network function in both forward and backward computation. In this work, we show how to reduce the training cost per iteration. Specifically, we propose a framework that uses m² cost only in the initialization phase and achieves a truly subquadratic cost per iteration in terms of m, i.e., m^{2-Ω(1)} per iteration. Our result has implications beyond standard over-parametrization theory, as it can be viewed as designing an efficient data structure on top of a pre-trained large model to further speed up the fine-tuning process, a core procedure to deploy large language models (LLM).
more » « less
Full Text Available
On the (In)approximability of Combinatorial Contracts. In 15th Innovations in Theoretical Computer Science Conference (ITCS 2024)

https://doi.org/10.4230/LIPIcs.ITCS.2024.44

Ezra, Tomer; Feldman, Michal; Schlesinger, Maya (January 2024, Schloss Dagstuhl – Leibniz-Zentrum für Informatik)
Guruswami, Venkatesan (Ed.)
We study two recent combinatorial contract design models, which highlight different sources of complexity that may arise in contract design, where a principal delegates the execution of a costly project to others. In both settings, the principal cannot observe the choices of the agent(s), only the project’s outcome (success or failure), and incentivizes the agent(s) using a contract, a payment scheme that specifies the payment to the agent(s) upon a project’s success. We present results that resolve open problems and advance our understanding of the computational complexity of both settings. In the multi-agent setting, the project is delegated to a team of agents, where each agent chooses whether or not to exert effort. A success probability function maps any subset of agents who exert effort to a probability of the project’s success. For the family of submodular success probability functions, Dütting et al. [2023] established a poly-time constant factor approximation to the optimal contract, and left open whether this problem admits a PTAS. We answer this question on the negative, by showing that no poly-time algorithm guarantees a better than 0.7-approximation to the optimal contract. For XOS functions, they give a poly-time constant approximation with value and demand queries. We show that with value queries only, one cannot get any constant approximation. In the multi-action setting, the project is delegated to a single agent, who can take any subset of a given set of actions. Here, a success probability function maps any subset of actions to a probability of the project’s success. Dütting et al. [2021a] showed a poly-time algorithm for computing an optimal contract for gross substitutes success probability functions, and showed that the problem is NP-hard for submodular functions. We further strengthen this hardness result by showing that this problem does not admit any constant factor approximation. Furthermore, for the broader class of XOS functions, we establish the hardness of obtaining a n^{-1/2+ε}-approximation for any ε > 0.
more » « less
Full Text Available
Leibniz International Proceedings in Informatics (LIPIcs):15th Innovations in Theoretical Computer Science Conference (ITCS 2024)

https://doi.org/10.4230/LIPIcs.ITCS.2024.69

Klein, Ohad; Slote, Joseph; Volberg, Alexander; Zhang, Haonan (January 2024, 15th Innovations in Theoretical Computer Science Conference (ITCS 2024).)
Guruswami, Venkatesan (Ed.)
Recent efforts in Analysis of Boolean Functions aim to extend core results to new spaces, including to the slice binom([n],k), the hypergrid [K]ⁿ, and noncommutative spaces (matrix algebras). We present here a new way to relate functions on the hypergrid (or products of cyclic groups) to their harmonic extensions over the polytorus. We show the supremum of a function f over products of the cyclic group {exp(2π i k/K)}_{k = 1}^K controls the supremum of f over the entire polytorus ({z ∈ ℂ:|z| = 1}ⁿ), with multiplicative constant C depending on K and deg(f) only. This Remez-type inequality appears to be the first such estimate that is dimension-free (i.e., C does not depend on n). This dimension-free Remez-type inequality removes the main technical barrier to giving 𝒪(log n) sample complexity, polytime algorithms for learning low-degree polynomials on the hypergrid and low-degree observables on level-K qudit systems. In particular, our dimension-free Remez inequality implies new Bohnenblust-Hille-type estimates which are central to the learning algorithms and appear unobtainable via standard techniques. Thus we extend to new spaces a recent line of work [Eskenazis and Ivanisvili, 2022; Huang et al., 2022; Volberg and Zhang, 2023] that gave similarly efficient methods for learning low-degree polynomials on the hypercube and observables on qubits. An additional product of these efforts is a new class of distributions over which arbitrary quantum observables are well-approximated by their low-degree truncations - a phenomenon that greatly extends the reach of low-degree learning in quantum science [Huang et al., 2022].
more » « less
Full Text Available
Fraud Detection for Random Walks

https://doi.org/10.4230/LIPIcs.ITCS.2024.36

Dani, Varsha; Hayes, Thomas P; Pettie, Seth; Saia, Jared (January 2024, Schloss Dagstuhl – Leibniz-Zentrum für Informatik)
Guruswami, Venkatesan (Ed.)
Traditional fraud detection is often based on finding statistical anomalies in data sets and transaction histories. A sophisticated fraudster, aware of the exact kinds of tests being deployed, might be difficult or impossible to catch. We are interested in paradigms for fraud detection that are provably robust against any adversary, no matter how sophisticated. In other words, the detection strategy should rely on signals in the data that are inherent in the goals the adversary is trying to achieve. Specifically, we consider a fraud detection game centered on a random walk on a graph. We assume this random walk is implemented by having a player at each vertex, who can be honest or not. In particular, when the random walk reaches a vertex owned by an honest player, it proceeds to a uniformly random neighbor at the next timestep. However, when the random walk reaches a dishonest player, it instead proceeds to an arbitrary neighbor chosen by an omniscient Adversary. The game is played between the Adversary and a Referee who sees the trajectory of the random walk. At any point during the random walk, if the Referee determines that a {specific} vertex is controlled by a dishonest player, the Referee accuses that player, and therefore wins the game. The Referee is allowed to make the occasional incorrect accusation, but must follow a policy that makes such mistakes with small probability of error. The goal of the adversary is to make the cover time large, ideally infinite, i.e., the walk should never reach at least one vertex. We consider the following basic question: how much can the omniscient Adversary delay the cover time without getting caught? Our main result is a tight upper bound on this delay factor. We also discuss possible applications of our results to settings such as Rotor Walks, Leader Election, and Sybil Defense.
more » « less
Full Text Available
Recursive Error Reduction for Regular Branching Programs

https://doi.org/10.4230/LIPIcs.ITCS.2024.29

Chattopadhyay, Eshan; Liao, Jyun-Jie (January 2024, Schloss Dagstuhl – Leibniz-Zentrum für Informatik)
Guruswami, Venkatesan (Ed.)
In a recent work, Chen, Hoza, Lyu, Tal and Wu (FOCS 2023) showed an improved error reduction framework for the derandomization of regular read-once branching programs (ROBPs). Their result is based on a clever modification to the inverse Laplacian perspective of space-bounded derandomization, which was originally introduced by Ahmadinejad, Kelner, Murtagh, Peebles, Sidford and Vadhan (FOCS 2020). In this work, we give an alternative error reduction framework for regular ROBPs. Our new framework is based on a binary recursive formula from the work of Chattopadhyay and Liao (CCC 2020), that they used to construct weighted pseudorandom generators (WPRGs) for general ROBPs. Based on our new error reduction framework, we give alternative proofs to the following results for regular ROBPs of length n and width w, both of which were proved in the work of Chen et al. using their error reduction: - There is a WPRG with error ε that has seed length Õ(log(n)(√{log(1/ε)}+log(w))+log(1/ε)). - There is a (non-black-box) deterministic algorithm which estimates the expectation of any such program within error ±ε with space complexity Õ(log(nw)⋅log log(1/ε)). This was first proved in the work of Ahmadinejad et al., but the proof by Chen et al. is simpler. Because of the binary recursive nature of our new framework, both of our proofs are based on a straightforward induction that is arguably simpler than the Laplacian-based proof in the work of Chen et al. In fact, because of its simplicity, our proof of the second result directly gives a slightly stronger claim: our algorithm computes a ε-singular value approximation (a notion of approximation introduced in a recent work by Ahmadinejad, Peebles, Pyne, Sidford and Vadhan (FOCS 2023)) of the random walk matrix of the given ROBP in space Õ(log(nw)⋅log log(1/ε)). It is not clear how to get this stronger result from the previous proofs.
more » « less
Full Text Available
The Non-Uniform Perebor Conjecture for Time-Bounded Kolmogorov Complexity Is False

https://doi.org/10.4230/LIPIcs.ITCS.2024.80

Mazor, Noam; Pass, Rafael (January 2024, Schloss Dagstuhl – Leibniz-Zentrum für Informatik)
Guruswami, Venkatesan (Ed.)
The Perebor (Russian for "brute-force search") conjectures, which date back to the 1950s and 1960s are some of the oldest conjectures in complexity theory. The conjectures are a stronger form of the NP ≠ P conjecture (which they predate) and state that for "meta-complexity" problems, such as the Time-bounded Kolmogorov complexity Problem, and the Minimum Circuit Size Problem, there are no better algorithms than brute force search. In this paper, we disprove the non-uniform version of the Perebor conjecture for the Time-Bounded Kolmogorov complexity problem. We demonstrate that for every polynomial t(⋅), there exists of a circuit of size 2^{4n/5+o(n)} that solves the t(⋅)-bounded Kolmogorov complexity problem on every instance. Our algorithm is black-box in the description of the Universal Turing Machine U employed in the definition of Kolmogorov Complexity and leverages the characterization of one-way functions through the hardness of the time-bounded Kolmogorov complexity problem of Liu and Pass (FOCS'20), and the time-space trade-off for one-way functions of Fiat and Naor (STOC'91). We additionally demonstrate that no such black-box algorithm can have circuit size smaller than 2^{n/2-o(n)}. Along the way (and of independent interest), we extend the result of Fiat and Naor and demonstrate that any efficiently computable function can be inverted (with probability 1) by a circuit of size 2^{4n/5+o(n)}; as far as we know, this yields the first formal proof that a non-trivial circuit can invert any efficient function.
more » « less
Full Text Available
New Lower Bounds in Merlin-Arthur Communication and Graph Streaming Verification

https://doi.org/10.4230/LIPIcs.ITCS.2024.53

Ghosh, Prantar; Shah, Vihan (January 2024, Schloss Dagstuhl – Leibniz-Zentrum für Informatik)
Guruswami, Venkatesan (Ed.)
We present novel lower bounds in the Merlin-Arthur (MA) communication model and the related annotated streaming or stream verification model. The MA communication model extends the classical communication model by introducing an all-powerful but untrusted player, Merlin, who knows the inputs of the usual players, Alice and Bob, and attempts to convince them about the output. We focus on the online MA (OMA) model where Alice and Merlin each send a single message to Bob, who needs to catch Merlin if he is dishonest and announce the correct output otherwise. Most known functions have OMA protocols with total communication significantly smaller than what would be needed without Merlin. In this work, we introduce the notion of non-trivial-OMA complexity of a function. This is the minimum total communication required when we restrict ourselves to only non-trivial protocols where Alice sends Bob fewer bits than what she would have sent without Merlin. We exhibit the first explicit functions that have this complexity superlinear - even exponential - in their classical one-way complexity: this means the trivial protocol, where Merlin communicates nothing and Alice and Bob compute the function on their own, is exponentially better than any non-trivial protocol in terms of total communication. These OMA lower bounds also translate to the annotated streaming model, the MA analogue of single-pass data streaming. We show large separations between the classical streaming complexity and the non-trivial annotated streaming complexity (for the analogous notion in this setting) of fundamental problems such as counting distinct items, as well as of graph problems such as connectivity and k-connectivity in a certain edge update model called the support graph turnstile model that we introduce here.
more » « less
Full Text Available
Leibniz International Proceedings in Informatics (LIPIcs):15th Innovations in Theoretical Computer Science Conference (ITCS 2024)

https://doi.org/10.4230/LIPICS.ITCS.2024.62

Henzinger, Monika; Saha, Barna; Seybold, Martin P.; Ye, Christopher (January 2024, 15th Innovations in Theoretical Computer Science Conference, {ITCS})
Guruswami, Venkatesan (Ed.)
Algorithms with predictions is a new research direction that leverages machine learned predictions for algorithm design. So far a plethora of recent works have incorporated predictions to improve on worst-case bounds for online problems. In this paper, we initiate the study of complexity of dynamic data structures with predictions, including dynamic graph algorithms. Unlike online algorithms, the goal in dynamic data structures is to maintain the solution efficiently with every update. We investigate three natural models of prediction: (1) δ-accurate predictions where each predicted request matches the true request with probability δ, (2) list-accurate predictions where a true request comes from a list of possible requests, and (3) bounded delay predictions where the true requests are a permutation of the predicted requests. We give general reductions among the prediction models, showing that bounded delay is the strongest prediction model, followed by list-accurate, and δ-accurate. Further, we identify two broad problem classes based on lower bounds due to the Online Matrix Vector (OMv) conjecture. Specifically, we show that locally correctable dynamic problems have strong conditional lower bounds for list-accurate predictions that are equivalent to the non-prediction setting, unless list-accurate predictions are perfect. Moreover, we show that locally reducible dynamic problems have time complexity that degrades gracefully with the quality of bounded delay predictions. We categorize problems with known OMv lower bounds accordingly and give several upper bounds in the delay model that show that our lower bounds are almost tight. We note that concurrent work by v.d.Brand et al. [SODA '24] and Liu and Srinivas [arXiv:2307.08890] independently study dynamic graph algorithms with predictions, but their work is mostly focused on showing upper bounds.
more » « less
Leibniz International Proceedings in Informatics (LIPIcs):15th Innovations in Theoretical Computer Science Conference (ITCS 2024)

https://doi.org/10.4230/LIPIcs.ITCS.2024.81

Milionis, Jason; Moallemi, Ciamac C; Roughgarden, Tim (January 2024, Schloss Dagstuhl – Leibniz-Zentrum für Informatik)
Guruswami, Venkatesan (Ed.)
In decentralized finance ("DeFi"), automated market makers (AMMs) enable traders to programmatically exchange one asset for another. Such trades are enabled by the assets deposited by liquidity providers (LPs). The goal of this paper is to characterize and interpret the optimal (i.e., profit-maximizing) strategy of a monopolist liquidity provider, as a function of that LP’s beliefs about asset prices and trader behavior. We introduce a general framework for reasoning about AMMs based on a Bayesian-like belief inference framework, where LPs maintain an asset price estimate, which is updated by incorporating traders' price estimates. In this model, the market maker (i.e., LP) chooses a demand curve that specifies the quantity of a risky asset to be held at each dollar price. Traders arrive sequentially and submit a price bid that can be interpreted as their estimate of the risky asset price; the AMM responds to this submitted bid with an allocation of the risky asset to the trader, a payment that the trader must pay, and a revised internal estimate for the true asset price. We define an incentive-compatible (IC) AMM as one in which a trader’s optimal strategy is to submit its true estimate of the asset price, and characterize the IC AMMs as those with downward-sloping demand curves and payments defined by a formula familiar from Myerson’s optimal auction theory. We generalize Myerson’s virtual values, and characterize the profit-maximizing IC AMM. The optimal demand curve generally has a jump that can be interpreted as a "bid-ask spread," which we show is caused by a combination of adverse selection risk (dominant when the degree of information asymmetry is large) and monopoly pricing (dominant when asymmetry is small). This work opens up new research directions into the study of automated exchange mechanisms from the lens of optimal auction theory and iterative belief inference, using tools of theoretical computer science in a novel way.
more » « less
Communicating with Anecdotes (Extended Abstract); Leibniz International Proceedings in Informatics (LIPIcs):15th Innovations in Theoretical Computer Science Conference (ITCS 2024)

https://doi.org/10.4230/LIPIcs.ITCS.2024.57

Haghtalab, Nika; Immorlica, Nicole; Lucier, Brendan; Mobius, Markus; Mohan, Divyarthi (January 2024, Schloss Dagstuhl – Leibniz-Zentrum für Informatik)
Guruswami, Venkatesan (Ed.)
We study a communication game between a sender and receiver. The sender chooses one of her signals about the state of the world (i.e., an anecdote) and communicates it to the receiver who takes an action affecting both players. The sender and receiver both care about the state of the world but are also influenced by personal preferences, so their ideal actions can differ. We characterize perfect Bayesian equilibria. The sender faces a temptation to persuade: she wants to select a biased anecdote to influence the receiver’s action. Anecdotes are still informative to the receiver (who will debias at equilibrium) but the attempt to persuade comes at the cost of precision. This gives rise to informational homophily where the receiver prefers to listen to like-minded senders because they provide higher-precision signals. Communication becomes polarized when the sender is an expert with access to many signals, with the sender choosing extreme outlier anecdotes at equilibrium (unless preferences are perfectly aligned). This polarization dissipates all the gains from communication with an increasingly well-informed sender when the anecdote distribution is heavy-tailed. Experts therefore face a curse of informedness: receivers will prefer to listen to less-informed senders who cannot pick biased signals as easily.
more » « less

« Prev Next »